Last updated: 2017-03-05
Bolker et al. ( ???): Generalized linear mixed models: a practical guide for ecology and evolution
Approachable:
Bayesian:
MCMC:
Technical:
Very Important:
Moderately Important:
Minimally Important:
Our tests (so far) assume independence of samples.
Lots of data have structure at additional levels beyond the main variable(s) of interest
We need a method to model that correlation structure.
Explicitly by design:
Implicitly as a side effect:
Multilevel / hierarchical / mixed / repeated measures models
Fixed effects:
Random effects
Also see: http://andrewgelman.com/2005/01/25/why_i_dont_use/
Units are grouped at different levels.
\[Y = \beta_0 + \beta_1 X_1 + ... + \beta_n X_n + \mbox{Random}\]
Where \(\mbox{Random}\) can be:
Each bird gets its own mean value for a maneuver (across 5 trials). Maneuver is analyzed as the mean difference between each maneuver.
Conceptual complexity
Computational complexity
Among the options for R packages:
nlme: Oldest, restricted to a subset of models (linear and non-linear Gaussian), for us the place to startlme4: Newer (but same authors as nlme), can handle more complex models with crossed random effects, pedigrees, non-Gaussian response, etc., more flexible distributions of response variables, but harder to test hypothesesMCMCglmm, rstan, rjags: Fit Bayesian multilevel models via a variety of MCMC samplersIntraclass correlation coefficient:
Lengths <- data.frame(Fem_Len = c(M$Len1, M$Len2),
MouseID = factor(c(M$MouseID, M$MouseID)))
Lengths <- Lengths %>% drop_na()
fm <- anova(lm(Fem_Len ~ MouseID, data = Lengths))
print(fm, digits = 5)
## Analysis of Variance Table ## ## Response: Fem_Len ## Df Sum Sq Mean Sq F value Pr(>F) ## MouseID 118 75.023 0.63579 3510.8 < 2.2e-16 *** ## Residuals 119 0.022 0.00018 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
var_a <- (fm$'Mean Sq'[1] - fm$'Mean Sq'[2]) / 2 var_a / (var_a + fm$'Mean Sq'[2])
## [1] 0.9994305
lme()library(nlme)
fm_lme <- lme(Fem_Len ~ 1,
random = ~ 1 | MouseID,
data = Lengths,
method = "ML")
head(coef(fm_lme))
## (Intercept) ## 46001 16.21499 ## 46002 17.08474 ## 46003 16.77983 ## 46004 16.64986 ## 46012 16.85481 ## 46013 15.99005
lme()Lengths %>% group_by(MouseID) %>% summarize(mean(Fem_Len)) %>% head()
## # A tibble: 6 × 2 ## MouseID `mean(Fem_Len)` ## <fctr> <dbl> ## 1 46001 16.215 ## 2 46002 17.085 ## 3 46003 16.780 ## 4 46004 16.650 ## 5 46012 16.855 ## 6 46013 15.990
lme()(varcomps <- VarCorr(fm_lme))
## MouseID = pdLogChol(1) ## Variance StdDev ## (Intercept) 0.3151310080 0.56136531 ## Residual 0.0001810924 0.01345706
var.among <- as.numeric(varcomps[1, 1]) var.within <- as.numeric(varcomps[2, 1]) var.among / (var.among + var.within)
## [1] 0.9994257
Note: very close but not numerically equivalent to the anova() version. Models are slightly different.
Seasonal patterns in testis mass in the squid Loligo. Testis mass predicted by dorsal mantle length, month, and DML X month.
Squid dataS <- read_excel("../data/Squid.xlsx")
S <- S %>% mutate(Month = factor(Month))
str(S)
## Classes 'tbl_df', 'tbl' and 'data.frame': 768 obs. of 5 variables: ## $ Specimen : num 1017 1034 1070 1070 1019 ... ## $ Year : num 1991 1990 1990 1990 1990 ... ## $ Month : Factor w/ 12 levels "1","2","3","4",..: 2 9 12 11 8 10 5 7 7 7 ... ## $ DML : num 136 144 108 130 121 117 133 105 109 97 ... ## $ Testis_Mass: num 0.006 0.008 0.008 0.011 0.012 0.012 0.013 0.015 0.017 0.017 ...
# Check for missing data sum(!complete.cases(S))
## [1] 0
Testis mass modeled by DML, Month, and the DML X Month interaction:
fm1 <- lm(Testis_Mass ~ DML + Month + DML:Month, data = S)
Extract the residuals and fitted values:
S$Res1 <- residuals(fm1) S$Fitted <- fitted.values(fm1)
What we need:
DML (here, more spread with larger values)Solution:
nlme::gls()library(nlme) variance_struct <- varPower(form = ~ DML | Month)
DML on a per-Month basis.fm2 <- gls(Testis_Mass ~ DML * Month, data = S,
weights = variance_struct)
plot(fm2)
library(car)
## ## Attaching package: 'car'
## The following object is masked from 'package:dplyr': ## ## recode
Anova(fm2, type = "III")
## Analysis of Deviance Table (Type III tests) ## ## Response: Testis_Mass ## Df Chisq Pr(>Chisq) ## (Intercept) 1 12.830 0.0003411 *** ## DML 1 97.686 < 2.2e-16 *** ## Month 11 131.398 < 2.2e-16 *** ## DML:Month 11 263.769 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
S$Res2 <- residuals(fm2, type = "normalized")
## coef(fm2) ## (Intercept) -4.780826964 ## DML 0.046586983 ## Month2 1.334893906 ## Month3 3.661220332 ## Month4 3.410126677 ## Month5 3.953042298 ## Month6 -0.666390313 ## Month7 3.669116396 ## Month8 3.241297524 ## Month9 -1.646885027 ## Month10 0.520272100 ## Month11 -1.742198739 ## Month12 -0.957170988 ## DML:Month2 -0.007741310 ## DML:Month3 -0.020683250 ## DML:Month4 -0.023433233 ## DML:Month5 -0.028570828 ## DML:Month6 -0.012384582 ## DML:Month7 -0.036971343 ## DML:Month8 -0.036261427 ## DML:Month9 -0.002745741 ## DML:Month10 -0.008962055 ## DML:Month11 0.007326031 ## DML:Month12 0.007205327
Data with hierarchical structure:
We could just take a means of each block and use those data for ANOVA.
Collected by National Institute for Coastal and Marine Management (RIKZ)
RIKZ <- read_excel("../data/RIKZ.xlsx")
glimpse(RIKZ)
## Observations: 45 ## Variables: 5 ## $ Sample <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16... ## $ Richness <dbl> 11, 10, 13, 11, 10, 8, 9, 8, 19, 17, 6, 1, 4, 3, 3, 1... ## $ Exposure <dbl> 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 11, 11, 11, 11, 11... ## $ NAP <dbl> 0.045, -1.036, -1.336, 0.616, -0.684, 1.190, 0.820, 0... ## $ Beach <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4,...
Richness: Number of speciesNAP: Height above mean tidal levelExposure: Index composed of wave action, length of the surf zone, slope, grain size, and the depth of the anaerobic layerConvert Beach to factor.
RIKZ <- RIKZ %>% mutate(Beach = factor(Beach))
Sites are the level of the observation.
RIKZ %>% group_by(Beach) %>% tally()
## # A tibble: 9 × 2 ## Beach n ## <fctr> <int> ## 1 1 5 ## 2 2 5 ## 3 3 5 ## 4 4 5 ## 5 5 5 ## 6 6 5 ## 7 7 5 ## 8 8 5 ## 9 9 5
5 observations at each of 9 beaches
NAPExposureEach Beach only has 1 value of Exposure.
fm1 <- lm(Richness ~ NAP + Beach, data = RIKZ) Anova(fm1, type = "III")
## Anova Table (Type III tests) ## ## Response: Richness ## Sum Sq Df F value Pr(>F) ## (Intercept) 466.37 1 49.8039 3.224e-08 *** ## NAP 230.66 1 24.6322 1.793e-05 *** ## Beach 416.37 8 5.5581 0.0001441 *** ## Residuals 327.74 35 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Maybe we are not interested in knowing the exact nature of the beach effect.
Allow a different mean richness per beach:
fm2 <- lme(Richness ~ NAP, random = ~ 1 | Beach, data = RIKZ)
plot(fm2)
## Linear mixed-effects model fit by REML ## Data: RIKZ ## AIC BIC logLik ## 247.4802 254.525 -119.7401 ## ## Random effects: ## Formula: ~1 | Beach ## (Intercept) Residual ## StdDev: 2.944065 3.05977 ## ## Fixed effects: Richness ~ NAP ## Value Std.Error DF t-value p-value ## (Intercept) 6.581893 1.0957618 35 6.006682 0 ## NAP -2.568400 0.4947246 35 -5.191574 0 ## Correlation: ## (Intr) ## NAP -0.157 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -1.4227495 -0.4848006 -0.1576462 0.2518966 3.9793918 ## ## Number of Observations: 45 ## Number of Groups: 9
Generate predicted values for overall population level estimates and within beach estimates.
# Overall RIKZ$Pred0 <- fitted(fm2, level = 0) # Within beach RIKZ$Pred1 <- fitted(fm2, level = 1)
Allow each Beach to have its own NAP to Richness linear relationship:
fm3 <- lme(Richness ~ NAP, random = ~ NAP | Beach, data = RIKZ)
Now the NAP to Richness relationship can vary for each Beach.
Model formula for random effects:
\[\sim \mbox{Slope}~|~\mbox{Intercept}\]
summary(fm3)
## Linear mixed-effects model fit by REML ## Data: RIKZ ## AIC BIC logLik ## 244.3839 254.9511 -116.1919 ## ## Random effects: ## Formula: ~NAP | Beach ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 3.549064 (Intr) ## NAP 1.714963 -0.99 ## Residual 2.702820 ## ## Fixed effects: Richness ~ NAP ## Value Std.Error DF t-value p-value ## (Intercept) 6.588706 1.264761 35 5.209448 0e+00 ## NAP -2.830028 0.722940 35 -3.914610 4e-04 ## Correlation: ## (Intr) ## NAP -0.819 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -1.8213326 -0.3411043 -0.1674617 0.1921216 3.0397049 ## ## Number of Observations: 45 ## Number of Groups: 9
Generate predicted values for overall population level estimates and within beach estimates.
# Overall RIKZ$Pred0 <- fitted(fm3, level = 0) # Within beach RIKZ$Pred1 <- fitted(fm3, level = 1)
## (Intercept) NAP ## 1 8.421057 -3.656268 ## 2 12.363497 -5.536814 ## 3 3.806643 -1.505716 ## 4 3.562422 -1.385960 ## 5 11.200150 -5.137323 ## 6 4.426279 -1.775675 ## 7 4.082944 -1.644329 ## 8 5.099895 -2.106852 ## 9 6.335471 -2.721314
Sequential measurements of the same individual / plot / organism, etc. will result in within-subject correlation.
Distance from the sella turcica to the pterygomaxillary fissure in growing children.
O <- read_excel("../data/Ortho.xlsx")
O <- O %>% mutate(Subject = factor(Subject),
Sex = factor(Sex))
glimpse(O)
## Observations: 108 ## Variables: 4 ## $ Distance <dbl> 26.0, 25.0, 29.0, 31.0, 21.5, 22.5, 23.0, 26.5, 23.0,... ## $ Age <dbl> 8, 10, 12, 14, 8, 10, 12, 14, 8, 10, 12, 14, 8, 10, 1... ## $ Subject <fctr> M01, M01, M01, M01, M02, M02, M02, M02, M03, M03, M0... ## $ Sex <fctr> Male, Male, Male, Male, Male, Male, Male, Male, Male...
O %>% group_by(Sex) %>% tally()
## # A tibble: 2 × 2 ## Sex n ## <fctr> <int> ## 1 Female 44 ## 2 Male 64
range(O$Age)
## [1] 8 14
Potthoff, R. F. and Roy, S. N. (1964) A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51: 313-326.
Age and Sex (like an ANCOVA model).Age by Distance relationship (slope) to vary for each Subject, which is also allowed to have its own mean.fm2 <- lme(Distance ~ Sex + Age, random = ~ Age | Subject,
method = "ML",
data = O)
# Overall O$Pred0 <- fitted(fm2, level = 0) # Age within Subject O$Pred1 <- fitted(fm2, level = 1)
## Linear mixed-effects model fit by maximum likelihood ## Data: O ## AIC BIC logLik ## 446.8352 465.6101 -216.4176 ## ## Random effects: ## Formula: ~Age | Subject ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 2.6447350 (Intr) ## Age 0.2149243 -0.76 ## Residual 1.3100397 ## ## Fixed effects: Distance ~ Sex + Age ## Value Std.Error DF t-value p-value ## (Intercept) 15.489709 0.9328629 80 16.604486 0.0000 ## SexMale 2.145491 0.7391991 25 2.902453 0.0076 ## Age 0.660185 0.0709131 80 9.309772 0.0000 ## Correlation: ## (Intr) SexMal ## SexMale -0.470 ## Age -0.792 0.000 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -3.16563246 -0.45463448 0.01446427 0.44559465 3.90045094 ## ## Number of Observations: 108 ## Number of Groups: 27
Anova(fm2, type = "III")
## Analysis of Deviance Table (Type III tests) ## ## Response: Distance ## Chisq Df Pr(>Chisq) ## (Intercept) 283.5863 1 < 2.2e-16 *** ## Sex 8.6649 1 0.003244 ** ## Age 89.1482 1 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fm1 <- lme(Distance ~ Sex + Age, random = ~ 1 | Subject,
method = "ML",
data = O)
# Overall
O$Pred0 <- fitted(fm1, level = 0)
# Age within Subject
O$Pred1 <- fitted(fm1, level = 1)
anova(fm1, fm2)
## Model df AIC BIC logLik Test L.Ratio p-value ## fm1 1 5 444.8565 458.2671 -217.4282 ## fm2 2 7 446.8352 465.6101 -216.4176 1 vs 2 2.021324 0.364
Adapted from McElreath ( ???)
No.
Watch Lecture 08-4